getHdata(pbc)
# Have upData move units from labels to separate attribute
pbc <- upData(pbc,
fu.yrs = fu.days / 365.25,
labels = c(fu.yrs = 'Follow-up Time',
status = 'Death or Liver Transplantation'),
units = c(fu.yrs = 'year'),
drop = 'fu.days',
moveUnits=TRUE, html=TRUE)
Input object size: 80952 bytes; 19 variables 418 observations
Label for bili changed from Serum Bilirubin (mg/dl) to Serum Bilirubin
units set to mg/dl
Label for albumin changed from Albumin (gm/dl) to Albumin
units set to gm/dl
Label for protime changed from Prothrombin Time (sec.) to Prothrombin Time
units set to sec.
Label for alk.phos changed from Alkaline Phosphatase (U/liter) to Alkaline Phosphatase
units set to U/liter
Label for sgot changed from SGOT (U/ml) to SGOT
units set to U/ml
Label for chol changed from Cholesterol (mg/dl) to Cholesterol
units set to mg/dl
Label for trig changed from Triglycerides (mg/dl) to Triglycerides
units set to mg/dl
Label for platelet changed from Platelets (per cm^3/1000) to Platelets
units set to per cm^3/1000
Label for copper changed from Urine Copper (ug/day) to Urine Copper
units set to ug/day
Added variable fu.yrs
Dropped variable fu.days
New object size: 84744 bytes; 19 variables 418 observations
# The following can also be done by running this command
# to put the results in a new browser tab:
# getHdata(pbc, 'contents')
html(contents(pbc), maxlevels=10, levelType='table')
| Name | Labels | Units | Levels | Storage | NAs |
|---|---|---|---|---|---|
| bili | Serum Bilirubin | mg/dl | double | 0 | |
| albumin | Albumin | gm/dl | double | 0 | |
| stage | Histologic Stage, Ludwig Criteria | integer | 6 | ||
| protime | Prothrombin Time | sec. | double | 2 | |
| sex | 2 | integer | 0 | ||
| age | Age | double | 0 | ||
| spiders | 2 | integer | 106 | ||
| hepatom | 2 | integer | 106 | ||
| ascites | 2 | integer | 106 | ||
| alk.phos | Alkaline Phosphatase | U/liter | double | 106 | |
| sgot | SGOT | U/ml | double | 106 | |
| chol | Cholesterol | mg/dl | integer | 134 | |
| trig | Triglycerides | mg/dl | integer | 136 | |
| platelet | Platelets | per cm^3/1000 | integer | 110 | |
| drug | 3 | integer | 0 | ||
| status | Death or Liver Transplantation | integer | 0 | ||
| edema | 3 | integer | 0 | ||
| copper | Urine Copper | ug/day | integer | 108 | |
| fu.yrs | Follow-up Time | year | double | 0 |
| Variable | Levels |
|---|---|
| sex | male |
| female | |
| spiders, hepatom | absent |
| ascites | present |
| drug | D-penicillamine |
| placebo | |
| not randomized | |
| edema | no edema |
| edema, no diuretic therapy | |
| edema despite diuretic therapy |
# did have results='asis' above
d <- describe(pbc)
html(d, size=80, scroll=TRUE)
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 98 | 0.998 | 3.221 | 3.742 | 0.50 | 0.60 | 0.80 | 1.40 | 3.40 | 8.03 | 14.00 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 154 | 1 | 3.497 | 0.473 | 2.750 | 2.967 | 3.243 | 3.530 | 3.770 | 4.010 | 4.141 |
| n | missing | distinct | Info | Mean | Gmd |
|---|---|---|---|---|---|
| 412 | 6 | 4 | 0.893 | 3.024 | 0.9519 |
Value 1 2 3 4 Frequency 21 92 155 144 Proportion 0.051 0.223 0.376 0.350
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 416 | 2 | 48 | 0.998 | 10.73 | 1.029 | 9.60 | 9.80 | 10.00 | 10.60 | 11.10 | 12.00 | 12.45 |
| n | missing | distinct |
|---|---|---|
| 418 | 0 | 2 |
Value male female Frequency 44 374 Proportion 0.105 0.895
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 345 | 1 | 50.74 | 11.96 | 33.84 | 36.37 | 42.83 | 51.00 | 58.24 | 64.30 | 67.92 |
| n | missing | distinct |
|---|---|---|
| 312 | 106 | 2 |
Value absent present Frequency 222 90 Proportion 0.712 0.288
| n | missing | distinct |
|---|---|---|
| 312 | 106 | 2 |
Value absent present Frequency 152 160 Proportion 0.487 0.513
| n | missing | distinct |
|---|---|---|
| 312 | 106 | 2 |
Value absent present Frequency 288 24 Proportion 0.923 0.077
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 312 | 106 | 295 | 1 | 1983 | 1760 | 599.6 | 663.0 | 871.5 | 1259.0 | 1980.0 | 3826.4 | 6669.9 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 312 | 106 | 179 | 1 | 122.6 | 60.45 | 54.25 | 60.45 | 80.60 | 114.70 | 151.90 | 196.47 | 219.25 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 284 | 134 | 201 | 1 | 369.5 | 194.5 | 188.4 | 213.6 | 249.5 | 309.5 | 400.0 | 560.8 | 674.0 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 282 | 136 | 146 | 1 | 124.7 | 64.07 | 56.00 | 63.10 | 84.25 | 108.00 | 151.00 | 195.00 | 230.95 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 308 | 110 | 210 | 1 | 261.9 | 107.8 | 117.7 | 139.7 | 199.8 | 257.0 | 322.5 | 386.5 | 430.6 |
| n | missing | distinct |
|---|---|---|
| 418 | 0 | 3 |
Value D-penicillamine placebo not randomized Frequency 154 158 106 Proportion 0.368 0.378 0.254
| n | missing | distinct | Info | Sum | Mean | Gmd |
|---|---|---|---|---|---|---|
| 418 | 0 | 2 | 0.71 | 161 | 0.3852 | 0.4748 |
| n | missing | distinct |
|---|---|---|
| 418 | 0 | 3 |
Value no edema edema, no diuretic therapy
Frequency 354 44
Proportion 0.847 0.105
Value edema despite diuretic therapy
Frequency 20
Proportion 0.048
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 310 | 108 | 158 | 1 | 97.65 | 83.16 | 17.45 | 24.00 | 41.25 | 73.00 | 123.00 | 208.10 | 249.20 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 399 | 1 | 5.251 | 3.429 | 0.671 | 1.661 | 2.992 | 4.736 | 7.155 | 9.649 | 11.063 |
| lowest : | 0.1122519 | 0.1177276 | 0.1396304 | 0.1943874 | 0.2108145 |
| highest: | 12.3203285 | 12.3449692 | 12.3832991 | 12.4736482 | 13.1279945 |
# prList is in Hmisc; useful for plotting or printing a list of objects
# Can just use plot(d) if don't care about the mess
# If using html output these 2 images would not be rendered no matter what
p <- plot(d)
# The option htmlfig=2 causes markupSpecs$html$cap() to be used to
# HTML-typeset as a figure caption and to put the sub-sub section
# marker ### in front of the caption. htmlfig is the only reason
# results='asis' was needed in the chunk header
# We define a long caption for one of the plots, which does not appear
# in the table of contents
# prList works for html notebooks but not html documents
# prList(p, lcap=c('', 'These are spike histograms'), htmlfig=2)
htmltools::tagList(p) # lapply(p, plotly::as.widget)
s <- summaryM(bili + albumin + stage + protime + sex + age + spiders +
alk.phos + sgot + chol ~ drug, data=pbc,
overall=FALSE, test=TRUE)
plot(s, which='categorical')
Figure 1.1: Proportions and χ2 tests for categorical variables
plot(s, which='continuous', vars=1 : 4)
Figure 1.2: Extended box plots for the first 4 continuous variables
plot(s, which='continuous', vars=5 : 7)
Figure 1.3: Extended box plots for the remaining continuous variables
html(s, caption='Baseline characteristics by randomized treatment',
exclude1=TRUE, npct='both', digits=3, middle.bold=TRUE,
prmsd=TRUE, brmsd=TRUE, msdsize=mu$smaller2)
| Baseline characteristics by randomized treatment. | |||||
| N |
D-penicillamine N=154 |
placebo N=158 |
not randomized N=106 |
Test Statistic |
|
|---|---|---|---|---|---|
Serum Bilirubin mg/dl |
418 | 0.725 1.300 3.600 3.649 ± 5.282 |
0.800 1.400 3.200 2.873 ± 3.629 |
0.725 1.400 3.075 3.117 ± 4.043 |
F2 415=0.03, P=0.9721 |
Albumin gm/dl |
418 | 3.342 3.545 3.777 3.524 ± 0.396 |
3.212 3.565 3.830 3.516 ± 0.443 |
3.125 3.470 3.720 3.431 ± 0.435 |
F2 415=2.13, P=0.121 |
| Histologic Stage, Ludwig Criteria : 1 | 412 | 0.03 4⁄154 | 0.08 12⁄158 | 0.05 5⁄100 | χ26=5.33, P=0.5022 |
| 2 | 0.21 32⁄154 | 0.22 35⁄158 | 0.25 25⁄100 | ||
| 3 | 0.42 64⁄154 | 0.35 56⁄158 | 0.35 35⁄100 | ||
| 4 | 0.35 54⁄154 | 0.35 55⁄158 | 0.35 35⁄100 | ||
Prothrombin Time sec. |
416 | 10.000 10.600 11.400 10.800 ± 1.138 |
10.025 10.600 11.000 10.653 ± 0.851 |
10.100 10.600 11.000 10.750 ± 1.078 |
F2 413=0.23, P=0.7951 |
| sex : female | 418 | 0.90 139⁄154 | 0.87 137⁄158 | 0.92 98⁄106 | χ22=2.38, P=0.3042 |
| Age | 418 | 41.43 48.11 55.80 48.58 ± 9.96 |
42.98 51.93 58.90 51.42 ± 11.01 |
46.00 53.00 61.00 52.87 ± 9.78 |
F2 415=6.1, P=0.0021 |
| spiders | 312 | 0.29 45⁄154 | 0.28 45⁄158 | χ21=0.02, P=0.8852 | |
Alkaline Phosphatase U/liter |
312 | 922 1283 1950 1943 ± 2102 |
841 1214 2028 2021 ± 2183 |
F1 310=0.06, P=0.8123 | |
SGOT U/ml |
312 | 83.8 117.4 151.9 125.0 ± 58.9 |
76.7 111.6 151.5 120.2 ± 54.5 |
F1 310=0.55, P=0.463 | |
Cholesterol mg/dl |
284 | 254 304 377 374 ± 252 |
248 316 417 365 ± 210 |
F1 282=0.37, P=0.5453 | |
|
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values. Tests used: 1Kruskal-Wallis test; 2Pearson test; 3Wilcoxon test . | |||||
require(rms)
getHdata(lead)
# Subset variables just so contents() and describe() output is short
# Override units of measurement to make them legal R expressions
lead <- upData(lead,
keep=c('ld72', 'ld73', 'age', 'maxfwt'),
labels=c(age='Age'),
units=c(age='years', ld72='mg/100*ml', ld73='mg/100*ml'))
Input object size: 53928 bytes; 39 variables 124 observations
Kept variables ld72,ld73,age,maxfwt
New object size: 14096 bytes; 4 variables 124 observations
To use Predict, summary, or nomogram in the rms package, you need to let rms first compute summaries of the distributional characteristics of the predictors
dd <- datadist(lead); options(datadist='dd')
dd # show what datadist computed
age ld72 ld73 maxfwt
Low:effect 6.166667 27.00 24.00 46.0
Adjust to 8.375000 34.00 30.50 52.0
High:effect 12.020833 43.00 37.00 59.0
Low:prediction 3.929167 18.00 18.15 33.2
High:prediction 15.000000 61.85 50.85 72.2
Low 3.750000 1.00 15.00 13.0
High 15.916667 99.00 58.00 84.0
Fit an ordinary least squares regrssion model
# Fit an ordinary linear regression model with 3 predictors assumed linear
f <- ols(maxfwt ~ age + ld72 + ld73, data=lead)
f$coef
Intercept age ld72 ld73
34.1058551 2.6078450 -0.0245978 -0.2389695
Unofrtunately, as far as I can tell there is no nice way to display the regression table in HTML. This oucld be done manually
beta = coef(olsfit). se = sqrt(diag(vcov(olsfit))) p = 1 - pchisq(Z^2, 1)
To get t-statistic pvalues need to know the df … Z <- beta/se p = 2 * (1 - pt(abs(Z), errordf))
Another approach is to use print.capture see https://stackoverflow.com/questions/47724189/extract-all-model-statistics-from-rms-fits
#parser
get_model_stats = function(x) {
cap = capture.output(print(x))
#model stats
stats = c()
stats$R2.adj = str_match(cap, "R2 adj\\s+ (\\d\\.\\d+)") %>% na.omit() %>% .[, 2] %>% as.numeric()
#coef stats lines
coef_lines = cap[which(str_detect(cap, "Coef\\s+S\\.E\\.")):(length(cap) - 1)]
#parse
coef_lines_table = suppressWarnings(readr::read_table(coef_lines %>% stringr::str_c(collapse = "\n")))
colnames(coef_lines_table)[1] = "Predictor"
list(
stats = stats,
coefs = coef_lines_table
)
}
get_model_stats(f)
$stats
$stats$R2.adj
[1] 0.45
$coefs
# A tibble: 4 x 5
Predictor Coef S.E. t `Pr(>|t|)`
<chr> <dbl> <dbl> <dbl> <chr>
1 Intercept 34.1 4.84 7.04 <0.0001
2 age 2.61 0.323 8.07 <0.0001
3 ld72 -0.0246 0.0782 -0.31 0.7538
4 ld73 -0.239 0.132 -1.8 0.0744
Residual plots:
r <- resid(f)
par(mfrow=c(2,2)) # 2x2 matrix of plots
plot(fitted(f), r); abline(h=0) # yhat vs. r
with(lead, plot(age, r)); abline(h=0)
with(lead, plot(ld73, r)); abline(h=0)
qqnorm(r) # linearity indicates normality
qqline(as.numeric(r))
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_AU.utf8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.utf8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.utf8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C
attached base packages:
other attached packages:
loaded via a namespace (and not attached):